Co-clustering through Optimal Transport
Abstract
In this paper, we present a novel method for co-clustering, an unsupervised learning approach that discovers homogeneous groups of data instances and features by grouping them simultaneously. The proposed method uses entropy-regularized optimal transport between empirical measures defined on data instances and features to obtain an estimated joint probability density function, represented by the optimal coupling matrix. This matrix is then factorized to obtain the induced row and column partitions using a multiscale representation approach. To justify our method theoretically, we show how the solution of the regularized optimal transport problem can be seen from the variational inference perspective, which motivates its use for co-clustering. The algorithm derived for the proposed method and its kernelized version, based on the notion of Gromov-Wasserstein distance, are fast and accurate, and can automatically determine the number of both row and column clusters. These features are demonstrated through extensive experimental evaluations.
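For reference, the entropy-regularized optimal transport problem mentioned in the abstract is commonly written as below. This is the standard formulation from the optimal transport literature, given as a sketch: the symbols $M$ (cost matrix), $\lambda$ (regularization parameter), $\mu$ and $\nu$ (the two empirical measures) are notation assumed here and do not appear in the abstract.

```latex
\gamma^{\star}_{\lambda}
  = \operatorname*{arg\,min}_{\gamma \in \Pi(\mu,\nu)}
    \langle M, \gamma \rangle_F - \frac{1}{\lambda} E(\gamma),
\qquad
E(\gamma) = -\sum_{i,j} \gamma_{ij} \log \gamma_{ij},
\qquad
\Pi(\mu,\nu) = \{ \gamma \in \mathbb{R}_{+}^{n \times d} :
  \gamma \mathbf{1} = \mu,\ \gamma^{\top} \mathbf{1} = \nu \}.
```

Under this formulation the optimal coupling has the form $\gamma^{\star}_{\lambda} = \operatorname{diag}(\alpha)\, \exp(-\lambda M)\, \operatorname{diag}(\beta)$ for positive scaling vectors $\alpha$ and $\beta$, with the exponential taken elementwise.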
Similar resources
Supplementary material for the paper: Co-clustering through Optimal Transport
For the sake of completeness, we first present Sinkhorn's theorem and explain how it was used to derive the solution of the regularized optimal transport. Theorem (Sinkhorn & Knopp, 1967). If A is an n × n matrix with strictly positive elements, then there exist diagonal matrices D1 and D2 with strictly positive diagonal elements such that D1 A D2 is doubly stochastic. The matrices D1...
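The theorem quoted above can be illustrated numerically with the classical alternating row and column normalization. The following is a minimal sketch, not the supplementary material's own code: the function name `sinkhorn_knopp` and the parameters `n_iter` and `tol` are hypothetical choices made for this example.

```python
import numpy as np


def sinkhorn_knopp(A, n_iter=1000, tol=1e-9):
    """Scale a strictly positive square matrix towards a doubly stochastic one.

    Returns (r, c, P) where P = diag(r) @ A @ diag(c); r and c play the role
    of the diagonals of D1 and D2 in the theorem.
    """
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("A must be a square matrix")
    if np.any(A <= 0):
        raise ValueError("A must have strictly positive entries")

    n = A.shape[0]
    r = np.ones(n)  # diagonal of D1 (row scalings)
    c = np.ones(n)  # diagonal of D2 (column scalings)
    P = A.copy()
    for _ in range(n_iter):
        c = 1.0 / (A.T @ r)  # rescale so that every column sums to 1
        r = 1.0 / (A @ c)    # rescale so that every row sums to 1
        P = r[:, None] * A * c[None, :]
        # Rows sum to 1 exactly after the last step; check the columns.
        if np.abs(P.sum(axis=0) - 1.0).max() < tol:
            break
    return r, c, P


if __name__ == "__main__":
    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    r, c, P = sinkhorn_knopp(A)
    print("row sums:   ", P.sum(axis=1))
    print("column sums:", P.sum(axis=0))
```

The loop stops once the column sums are within `tol` of 1, so the result is doubly stochastic only up to that tolerance.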
Co-clustering Spatial Data Using a Generalized Linear Mixed Model With Application to the Integrated Pest Management
Co-clustering has been broadly applied to many domains such as bioinformatics and text mining. However, model-based spatial co-clustering has not been studied. In this paper, we develop a co-clustering method using a generalized linear mixed model for spatial dat...
Energy Saving in Kiln Unit of ABYEK CEMENT CO: Data Clustering Approach
The cost of producing cement worldwide depends on the level of wages, the cost of energy, and the availability of raw materials. According to the financial statements of various companies on the stock market, electricity and fuel account for nearly 27 percent of total costs, so they play an important role in the sound management of energy consumption. In this regard, mathematical modeling and...
Identifiability of Nonparametric Mixture Models and Bayes Optimal Clustering
Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable by introducing a novel framework for clustering overfitted parametric (i.e. misspecified) mixture models. These conditions generalize existing conditions in the literature, and are flexible enough to include for example mixtures of Gaussian mixtures. In...
Convex Clustering via Optimal Mass Transport
We consider approximating distributions within the framework of optimal mass transport and specialize to the problem of clustering data sets. Distances between distributions are measured in the Wasserstein metric. The main problem we consider is that of approximating sample distributions by ones with sparse support. This provides a new viewpoint to clustering. We propose different relaxations o...
Publication date: 2017